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Abstract. Applications, ranging from tracking molecular motion within cells to analyzing com- 
plex animal foraging behavior, require algorithms for associating a collection of spot-like particles in 
one image with particles contained in another image. These associations are often made via network 
^ ' flow algorithms. However, it is often the case that many candidate association solutions (the output 

' _ ' of network flow algorithms) have nearly optimal scores; in this case, the optimal assignment solution 

^_a^ is of dubious quality. Algorithms for reliably computing the uncertainty of candidate association 

Cn solutions are under-developed in situations where many particles are tracked over multiple frames 

» I of data. This is due in part to the fact that exact uncertainty quantification (UQ) in large associ- 

O , ation problems is computationally intractable because the exact computation exhibits exponential 

^^ dependence on the number of particles tracked. We introduce a technique that can accurately and ef- 

^^ ficiently quantify association ambiguity (i.e., UQ for discrete association problems) without requiring 

,_^ the evaluation of the cost of each feasible association solution. Our method can readily be wrapped 

^vj around existing tracking algorithms and can efficiently handle a variety of 2D association problems. 

The applications presented are focused on tracking molecules in live cells. Our method is validated 
^ via both simulations and experiments. The experimental applications aim to accurately form tracks 

^H and quantify diffusion of quantum dot labeled proteins from in vivo measurements; here association 

^—f problems involving cost matrices possessing hundreds to thousands of rows/columns are encountered. 

r "y For such large-scale problems, we discuss how our approach can efficiently and accurately quantify 

^*, inherent uncertainty in candidate data associations. 

_o 

' ^ Key words. Ambiguity Assessment, Network Flow UQ, Live Cell Microscopy, Single Particle 

'"Y Tracking (SPT) 

I— I AMS subject classifications. 15A15, 15A09, 15A23 

^ 1. Introduction. Tracking multiple point-like objects exhibiting complex (a 

CT\ priori unknown) dynamics in a crowded environment over a sequence of images with 

J^ the intention of inferring statistical kinetic information is of interest to a variety of 

applications. For example, the work presented herein was inspired by advances in 
optical microscopy which have significantly increased the temporal and spatial res- 
^r olution associated with measurements of particles in live cells fl}[3]. The ability to 

accurately monitor fluorescently tagged molecules and/or cells in their native biolog- 
ical environment has enormous potential in addressing open questions in molecular 
and cell biology [lJ|3||8J. In these applications, experiments produce a large volume 
of time-ordered image data that require extensive computational processing before 
biologically relevant information can be extracted [SJItIIs] . Although the remainder of 
^ this work is focused on tracking molecules within cells, the techniques are applicable 

to a variety of diverse scenarios where the goal is to gather temporal statistics on 
multiple objects measured simultaneously in images [9- 11 
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Track formation is necessary before particle kinetics can be inferred from experi- 
mental data. Throughout this article, we use the term "track" to refer to a collection 
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of measurements at different times believed to be produced by the same underlying 



molecule. The quality of tracks formed via association problems 12 can vary sub- 
stantially due to various factors f^lls]. For example, the environment associated with 
the collected data may be cluttered, hence complicating track formation [TIE] or the 
observed physical motion of the particle may be different than that assumed by models 
attempting to forecast particle motion. Therefore, computational means for automat- 
ically and efficiently quantifying the uncertainty in candidate tracks provides valuable 
information that can be used for the assessment of track formation. State-of-the-art 



single particle trackers I2]l3]l5]l7]|8 in biological applications focus on extracting the op- 



timal association solution and do not directly quantify the uncertainty in association 
solutions. The uncertainty in discrete association problems is fundamentally different 
than track state uncertainty (which is often approximated via covariance matrices). 

In this article, we introduce a computationally efficient approach for quantifying 
association uncertainty (also known as "ambiguity" fl2]). The approach is illustrated 
on both controlled simulations and experimental data. Our algorithm can efficiently 
quantify the probability of putative associations connecting two distinct frames, i.e., 
questionable tracks are flagged without requiring ground-truth data guidance. This 
output can quantify both frame-to-frame and "gap-closing" associations 8] where 
track segments at different times are stitched together. In biological tracking, this 
has great utility in both pruning questionable tracks and also in identifying physical 
phenomena of interest (e.g., endocytosis or quantum dot blinking [5]) in observed 
experimental data. Another benefit of our algorithm is that it can readily augment 
existing algorithms [5][7|[8J with negligible additional computational overhead. 

The two experimental applications presented focus on reliably forming tracks 
when biomolecules are tagged with quantum dot probes. Quantum dots are attrac- 



tive because they allow one to track molecules for a long time 13 14 , but these 
probes "blink" (hence gap-closing [s] or other schemes are needed to follow particles 
for long times f^). In the first experimental application presented, we quantify how 
the diffusion coefficient of epidermal growth factor receptors (EGFR), tagged with 
quantum dots, varies with spatial position in the plasma membrane of live COS-7 
cells fzl. Accurately quantifying spatial dependence of the diffusion coefficient and 
other kinetic parameters is important in many membrane studies because it provides 
information about the local molecular environment experienced by tagged particles. 
For instance, such information can be used to predict the spatial clustering of mem- 
brane proteins roj. We show how track ambiguity assessment can be used to aid in 
generating reliable tracks (by pruning questionable associations) for this analysis. To 
facilitate algorithmic comparisons, we utilize a publicly available dataset [t] in our 
first experimental application. Here, our method is used to process output generated 
directly from the "multiple-target tracing (MTT)" algorithm presented in Ref. [?]. 

In order to demonstrate practical utility on larger problems, results obtained from 
modifying the MTT are also analyzed to demonstrate our algorithm's performance on 
the same experimental data in a scenario where more feasible solutions are considered 
(i.e., 330 putative tracks are associated to 351 measurements by suitably adjusting 
tunable MTT parameters [t] to relax association gating |15i). Accuracy, timing, and 
computational complexity results are presented. We also study an even larger scenario 
where many voltage-gated ion channels tagged with quantum dots are tracked. This 
larger system demonstrates how the approach introduced can aid in contemporary 
large scale systems studied in biophysics. In this problem, the tracking algorithm em- 
ployed, u-track [8 , requires associating thousands of track segments in the gap-closing 



step of u-track (the problem studied had 39,577 non-zero entries in the 2705x2705 
cost matrix defining this problem). 

In addition, we discuss problems arising in other popular techniques, such as 
Multiple Hypothesis Tracking (MHT) 15-18] and Interacting Muhiple Model (IMM) 



filters [5lp 19 , when attempting to analyze particles exhibiting complex stochastic 
motion where accurate models are unavailable a priori. Large model uncertainty is 
ubiquitous in SPT applications. Particle filters 20|2T and MHT trackers !22i can cre- 



ate biased associations due to assuming inaccurate dynamical models and/or parame- 



ters. Furthermore, the dynamics of a collection of particles can change over time 11 
and hence tracking algorithms appealing to fixed evolution rules can encounter se- 
rious problems. Illustrative examples highlighting the aforementioned problems are 
presented. Techniques for using the algorithm introduced here to improve reliability 
of SPT algorithms in the presence of model uncertainty are also briefly discussed. 

2. Background. Traditional association problems in target tracking utilize a 



variety of discrete optimization routines 15 23 . Quantifying the ambiguity in such 



optimization problems often relies on Markov Chain Monte Carlo (MCMC) meth- 



ods or fc— best assignment solvers 12 . However, fast and reliable UQ methods are 
problematic due to computational concerns outlined in the next section. For crowded 
scenarios with poorly understood dynamical models. Probability Hypothesis Density 
(PHD) filter methods can be leveraged for tracking the number and position of objects 



present at any given frame 24 , but tracking and data association (i.e., tracking mul- 



tiple objects over time) can be problematic with PHD filters if one's goal is to extract 



kinetic information from putative tracks 25 . Constructing high quality tracks for 



kinetic information extraction requires both accurate detection and data association 
algorithms. 

Recently, the use of established computational tools from target tracking to solve 
various association problems arising in the analysis of cell biophysics microscopy data 
has attracted considerable interest. In these applications, reliably forming multiple 
tracks corresponding to numerous physical objects in the field of view of the mi- 
croscope is of high importance [T|-l3ll6]. To follow individual molecules over time. 



researchers in single particle tracking (SPT) leverage target tracking 15 and com- 
puter vision techniques [2|. For example, one of the early SPT approaches aiming to 
extract kinetic information from in vivo quantum dot tagged particles can be found in 
Ref. p] ; the authors employed a 2D linear assignment solvers and refrained from using 
more computationally involved multiple hypothesis tracking (MHT) techniques due 
to perceived computational intractability pi. Established and well-known optimiza- 
tion routines for solving the linear assignment problem, such as the Jonker-Volgenant 
(JV) algorithm |26i, were used in Ref. i5j. The JV algorithm is also currently utilized 
by publicly available MATLAB-based software to find the optimal global association 
solution 1 8 . Other publicly available codes It] overcome computational problems as- 
sociated with tracking many tagged particles by employing aggressive heuristic gating 
rules that effectively divide the large association problem into several smaller indepen- 
dent local problems (brute force optimization routines are employed after aggressive 
gating in Ref. |7 ). More recently, the SPT community has begun developing their 
own variants of MHT solvers 22 M^ Current comprehensive reviews of existing tools 
in SPT can be found elsewhere 12 
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^Thc term SPT is used to denote that individual molecules are tracked over time; many tracking 
methods with the SPT label do simultaneously track multiple particles 



Although practical MHT solvers (capable of tracking multiple objects in real-time 



applications) have existed for some time in target tracking applications 15 17 18 
the use of these tools in SPT applications requires careful thought. For example, in 
biophysics applications, accurate and reliable models describing object motion are dif- 
ficult to determine a priori^ but many advanced tracking algorithms require this input. 
We provide concrete examples illustrating the problems that can be encountered if 
one uses tracking approaches depending heavily on a priori given models. Regardless 
of the type of algorithm used to form tracks, methods for accurately computing the 
association UQ or ambiguity are helpful is assessing the quality of candidate tracks. 
The approach used here can leverage existing linear assignment solvers, such 
as the JV algorithm, or can process the top k scoring association solutions using 
modern network flow optimization tools. In the results reported, we use k-hest linear 
assignment solvers allowing one to process rectangular (asymmetric) cost matrices 



and also consider nodes having infinite capacity 12 23 , 27 . Our new algorithmic 
contribution shows how to utilize the information content in the fc-best solutions 
and the assumed cost matrix to make better approximations of the uncertainty in 
candidate association solutions. We also discuss how the tool can assist in other 
tasks associated with large-scale particle tracking problems. The method we describe 
is indifferent to the level of sophistication of the assignment solver one utilizes (so 
researchers without access to advanced discrete optimization codes can still utilize 
the tools presented). For example, we show how the single best global association 
output of the JV algorithm can be analyzed with the algorithm introduced. 

3. Approach. 

3.1. Problem Setup. The goal is to assign M existing tracks, {T„i}^^i, to N 
particle measurements, {Pn}n=iJ i^ the current image frame. An arc Si is defined by 
a pair of nodes, (Trm, Pm)] arc Si can be assigned a likelihood Li = L{si) based on 
kinetic modeling assumptions [5J^^2]. Typically, 2D assignment routines work with 
log(ii) [12], we refer to the log likelihoods as the "scores". Both the set of all feasible 
arcs and the associated scores are encoded in a cost matrix defining the 2D association 
problem (5ll8Jll2]. A collection of arcs where each particle is either (i) paired to an 
existing track or (ii) assigned to the null- node = 0, defines a hypothesis Hj\ the collec- 
tion of all feasible hypotheses is denoted by T-L. Assigning to a particle measurement 
results in so-called "track initiation" in the tracking paradigms considered here ItIIs] . 
Note that tracks can also be assigned to resulting in a "missed detection" event. 



As is commonly done in tracking 12 , the likelihood of any arc associating to is set 
to the value of one (hence making no score contribution). Assuming associations are 
statistically independent (a common tracking assumption [5|[7{[8|[l2] ) , hypotheses can 
be ranked according to log (L(iJj)) = log ( J|^ ^j:^ L^). Optimization routines are 
typically used to extract the top scoring hypothesis in order to update tracks with 
new particle measurements [5][7J|8||T2] . 

Optimization routines do not address ambiguity in assignments. Vastly different 
hypotheses can have effectively identical scores. In tracking, standard approaches at- 
tempt to compute the exact hypothesis probability, P{Hj) = .f^ ^ \', „ ■, , to quantify 



this problem 12 ; for example, P{Hj) > 0.95 may suggest an "unambiguous associ- 
ation" solution in the problem under study. Accurately quantifying that specific arc 
associations are likely correct (as opposed to estimating P{Hj)) is often more infor- 
mative in biological SPT applications, since multiple hypotheses may contain an arc 
of scientific interest. Denote an arc for which we want to compute the association 



probability by s (we refer to this as the "target arc"); assume that this target arc 
contains two non-null nodes. Define 'H('^^ to be the set of all hypotheses containing 
s and let 'H^*'' be the set of all hypotheses that do not contain s. The arc ambiguity 






can then be written as: P(s) = ^ — ° r lu 

For moderate problem sizes, simple combinatorics makes brute force approaches 
to computing exact arc or hypothesis probabilities computationally intractable or 
problematic. A traditional approach used to circumvent problems introduced by an 
exponential number of members in Ti, is to define a proper subset of hypotheses, 
-^savap g 7^ and then compute PiHA = ^ ^''"'^ ,,„, flil. The hat denotes an 

approximated probability. The analogous arc probability approximation is P{s) = 

— r-;^ ° r^„ , . We label the above truncation schemes for computing PiH^) 

or P{s) as the "Traditional Method" 12]. Our method aims to improve the accuracy 
of P{s) by making better use of the information contained in the cost matrix and 
j^samp ^^ situations where complete enumeration of H is not possible or desirable. 



3.2. The Correspondence Method. The input to the our algorithm con- 
sists of the cost matrix defining the 2D association problem, the target arcs of in- 
terest {st}^i, and a set •^^'^'"p of candidate association solutions. Solutions can 



be generated by various means. This article utilizes fc-best solvers [12l 28 , i.e., 
-^samp ^ {Hi,H2,...Hk} such that L{Hi) > L{H2) > ...L{Hk). The output is 
{P{st)}]~i, where P{st) is an estimate of the exact arc probability, P{st). Ambiguity 
estimates can be computed using only one candidate solution, so the method can be 
utilized by existing trackers where the association solver cannot be readily modified 
or replaced. After presenting the general formalism in this subsection, we illustrate 
the method on a concrete small toy simulation example to clarify the basic intuition 
behind the algorithm in the next subsection (the formalism presented here facilitates 
the method's computational implementation outlined in the Appendix). 

For a given target arc, s, our method partitions the hypothesis space into clusters, 
such that each cluster Cf^. corresponds to a unique Hj e Til^^ (this key feature 
inspired the algorithm's name, i.e., the Correspondence Method). Recall that an 
arc s connects generic nodes a and b; below it is convenient to explicitly represent 
nodes in an arc via ab = s. For each Hj G "H^", the cluster Clf, is defined by: 
{Hj ~ ah — xy + xb + ay : xy E Hj s.t. (xb E A A ay E A)} where A is the set of all 
feasible arcs. That is, cluster Cf^. is generated by swapping ab with each of the other 
arcs in Hj, one at a time. 

For each hypothesis Hj E ^samp ^ .^g determine which cluster Hj belongs to. If 
Hj E Hi", then Hj trivially belongs to cluster Cf^,. If Hj E Uf'^, we let a' = 
adJH (a), where node adjn (a) denotes the node associated to a in hypothesis Hj 
(i.e., the other node of the arc in Hj containing a, or if no such arc exists), and 
let b' = adJH,{b). Then, we have Hj E C^H,-aa'^b'b+b'a'+ab)- However, if b'a' ^ A, 
then Hj — aa' — b'b + b'a' + ab does not exist. In this case, the hypothesis is in a 
singleton cluster. Cluster formation is illustrated graphically in the Appendix. Each 
cluster's likelihood is computed and subsequently re- weighted to generate P{s). The 
re- weighting procedure is outlined next. 



For each cluster C, define the cluster probability: 
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Note that Qs{C) is the fraction of the probability mass in the cluster that consists 
of the hypotheses in H^*^. Next, a weighted cluster sample C'"""^ — {C{Hj)\Hj e 
y^sampy jg constructed where C{Hj) is the cluster containing Hj. Each cluster C € 
Qsamp j^g^g weight w(C) = X]ff GCn«='""p ^(^i)- ^^'' estimate of the association 
probability of s given such a sample C|°™p is: 
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3.2.1. Illustrative Example on a Toy Problem. To concretely demonstrate 
the approach, we illustrate a toy 2x2 association problem where exact computation 



is possible (only 7 feasible hypotheses exist). Figure 3.1 displays the arcs and the 
associated likelihood score for the problem. 




L(.s-2) = 10 











L = 1 
(score for any arc containing a null node) 



Fig. 3.1. Toy problem setup. Likelihoods of feasible arcs displayed. Recall that any arc con- 
taining the null node has a likelihood score of unity, and thus a cost of zero. 



First suppose, that one has access to an optimization routine providing only 
the single best global hypothesis (i.e., k = 1). In this problem. Hi — {si,S2} and 
L{Hi) = 110 = 11 X 10. Recall that the cluster scheme is defined by a given arc Si 
and 7^«"™P. So to compute -P(si), one simply forms the cluster members defined by 
arc si. Figure [A . 1 1 displays all of the cluster corresponding to analyzing si; in this toy 
problem, only two clusters exist. Since si e -ffi, one of the clusters is labeled as C|/ . 
When 7^«'»™P only contains one member, the -P(si) computation is straightforward; 
one simply computes Qsi{C^_^) = ]^]^o_i!;}2+io ~ 0.8333 (the denominator comes from 
scoring the three hypotheses making up cluster C^ ; the members of this cluster are 
shown in F ig. A.l ). Given that only one hypothesis was considered, it can be observed 
from Eqn. [Othat P{si) = Qs^{C'l^^)■ 

The case where fc > 1 hypothesis requires one to utilize the cluster weighting 
scheme. Suppose we have fc = 3; in the toy problem under consideration, one augments 
Hi with H2 = {S3,S4} (L(iJ2) = 12 = 4 X 3) and H^ = {si} {LiH^) = 11). H2 is 
actually a member of cluster C^ (this cluster probability was already computed for 
fc = 1), hence the weight of this cluster coupled with the given fc = 3 top hypotheses 



Table 3.1 
Arc UQ as a function of k for Traditional (Trad.) and the Correspondence Method (CM). 
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increases to w{C'^ ) — 122 = 110 + 12. H^ is a hypothesis containing the target 
arc si and subsequently forms a new chister; from this new cluster, C^ (shown in 

11. 

19' 



Fig. A.l), one can compute the corresponding cluster probability Qsj^{C^, ) 



the numerator comes from L{H^) and the denominator comes from computing the 
sum of scores in all of the hypotheses in C|/ (19 = 11 + 4 + 3+1). Since H^ is the 
only member of cluster C^^ present in the top fc = 3 solutions, the weight of this 
cluster is also L{H^) = 11 

llxii + 122xfr 



122+11 



So ou r me thod's estimate for A: = 3, utilizing Eqn. 3.2 
0.812294. Tabic 
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displays results for all other arcs and k values. 



3.3. Consistency of the Correspondence Method. Our method introduces 
zero bias when 7{*'»™p — %, That is, we show that P{s) is precisely equal to the true 
P{s) when 7^'*'""p — %, In this case, one has: 
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so plugging the above into Eqn. 3.2 shows that P{s) = P{s 



(3.4) 
(3.5) 



3.4. Correspondence Method Summary and Comments. The main ideas 
underlying the Correspondence Method have now been presented. Additional techni- 
cal details utilized in practical computation and a complexity analysis are contained in 
the Appendix. The essence of the idea is to create additional feasible hypotheses (i.e., 
associations solutions) from given candidate solutions and then utilize the original and 
supplemental information to improve arc ambiguity UQ computations by systemati- 
cally combining the information to produce arc ambiguity estimates. The approach 
is guaranteed to be unbiased after enumeration of all hypotheses (as shown above). 
The traditional fc— best method shares this property 12 . However, we provide several 



large-scale examples illustrating why the convergence of the Correspondence Method 



is much faster that the traditional fc— best method |12 . The Correspondence Method 
shows great promise in scenarios where one can only compute a small fraction of the 
top scoring feasible hypotheses due either to time or memory constraints. In situa- 
tions where many of the top scoring hypotheses overlap in multiple arcs, our method 
provides a means for extracting more reliable arc UQ estimates implied by both (i) the 
top k solutions directly outputted by the discrete fc-best solver and (ii) associations 
implied by hypotheses constructed in the cluster formation phase of the Correspon- 
dence Method. The rapid enhanced sampling of hypothesis space afforded by the 
cluster probability computation is the reason for the method's improved accuracy. 
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Fig. 4.1. Two hypotheses connecting frames 19 and 20 (left panel). Results corresponding 
to selecting each hypothesis (middle panel [tracks resulting from selecting Optimal Soln.] and right 
panel [2nd Best]); tracks denoted by lines and measurements by spheres. The two hypotheses differ 
only by arcs si and s-z. Note: Only 3 candidate particle measurements are available to pair with ^ 
tracks; also only a portion of the tracks are plotted (data for frames 1-15 not shown). 



4. Results and Discussion. Figure [4~T] displays the two top scoring hypotheses 
in frame 20 from Ref. iTj's data. Four tracks formed from frames 1-19 need to be 
assigned to three measurements. The top two association solutions differ only by 
swapping which track is assigned to the null node (i.e., a putative missed detection). 
The tracks in this figure were formed using the tracker presented by Serge et al. It]; 
this tracker only considers tracks and measurements in a small local spatial area 
(default parameters were used except the parameter 'Boul e_free| ' [T] was increased 
from 3.5 to 4.0 to reduce the severity of heuristic gating). More specific track and 
measurement information characterizing the algorithm is provided in the Appendix. 

In Tab. |4.1[ we approximate the uncertainty associated with the arcs distinguish- 
ing the top two hypotheses using various methods. In the "local k = 1" runs, we 
utilize the direct output of Serge et al.'s tracker iTj. An advantage of analyzing the 
local association problem (i.e., the formulation using aggressive gating resulting in 
several disjoint local association neighborhoods [t]) is that the local 4x3 problem 



(3- 



hypotheses, so exact arc ambiguity can be read- 



contains only 73 ( = J2a=o L) 

ily computed (see bottom row in Tab. |4.l[ ). The arcs in the best local solution provide 
an accurate estimate of these probabilities whereas the traditional methods do not 
even consider the possibility of incorrect associations (see "fc = 1 Traditional" ) . We 
also used a fc— best solver to construct additional hypotheses to serve as input to both 
the Traditional Method and the Correspondence Method; the enhanced accuracy of 



Table 4.1 
"Traditional" denotes the arc ambiguity estimate discussed in Sec. [3| and "CM" indicates 
Correspondence Method. 
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Truth computed using "Traditional" A;— best estimate [l2| with k = 10* 
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Rate of convergence of -P(si) for 330 X 351 association problem shown in Table 

Fig. 4.2. 
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4.1. Global Performance on Larger 2D Association Problems. Table 144] 
and Fig. |4.2| display results where, in the same frame, we completely relax the aggres- 
sive gating of Serge et al. 's tracker and construct a dense global association problem 
(i.e., to extend frame 19, 330 tracks need to be associated with 351 measurements). 
Note that in the table, the "truth" cannot be computed precisely in this larger prob- 
lem (fc = 10^ results serve as the true arc association probability). Figure 



4.2 



displays 



the rate of convergence for k € [1, 100]. The computational timing results in Tab. 4.1 
(measured on a single core using Matlab and a C++ "mex file"-based implementation 
of our algorithm and the fc— best assignment solver) demonstrate that the global asso- 
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elation problem with track ambiguity computations ean be readily handled. If one can 
easily switch out association solvers, it is clearly worth the effort, but an appealing 
feature of the Correspondence Method is that it can compute accurate association 
uncertainty using standard SPT output without modifying the assignment solver. 

The top association implies that a recently initiated track (initiation point shown 
via red square) is assigned a new measurement whereas a longer living track is unas- 
signed. The second best hypothesis implies that the newly formed track has a missed 
detection in the frame immediately after track initiation implying that the putatively 
confirmed measurement was likely just fluorescence noise (i.e., a false alarm). Decid- 
ing between the top two hypothesis is effectively a "coin flip" given the ambiguity 
scores. Rationally deciding which hypothesis to believe (if any) requires subject mat- 
ter expertise and/or tools analyzing multiple frames of data, such as a MHT tracker. 
This is an example of a case where MHT technology can help; results presented later 
show that MHT trackers do not always resolve the ambiguity issue. In any case, the 
Correspondence Method allows one to automatically flag cases where large ambiguity 
resides with minimal overhead in large scale problems. 

Next, we reanalyzed all the experimental tracks from Ref. [7i. The local tracker 
described above (i.e., the tracking algorithm parameter settings resulting in the ag- 
gressive "local" gating) was used to generate tracks. This output was then used to 
estimate a collection of diffusion coefflcients by fitting a linear stochastic differential 



equation via maximum likelihood estimation (MLE) to each individual track; Fig. 4.3 
plots the estimated 2D diffusion coefficient as a function of space (MLE and technical 
surface fitting details are deferred to the Appendix). In the left panel, 345 tracks 
had their diffusion coefficient estimated (only tracks having greater than 25 time or- 
dered entries were fit by MLE). Any track involved in an assignment whose top arc 
had a net probability less than 95% was flagged as suspicious and the tracks in the 
hypothesis were pruned from the collection; this resulted in removing 121 tracks Fl 
The population statistics of the full vs. pruned diffusion coefficient estimates differed 
substantially, e.g., median 0.0457 vs. 0.0264 firn^/s. This suggests that uncertain 
tracks biased the estimate (e.g., particles received artificially large fluctuations due 
to incorrect associations). Alternatively, this could imply that certain regions of high 
diffusivity are too crowded for reliable track formation. The latter explanation is 
weakened since the similar spatial regions of higher diffusivity can still be observed; 
the magnitude is simply lower when uncertain tracks are pruned. 

In this previous example, we aggressively / conservatively pruned tracks. Any 
track containing an arc possessing P{s) was completely ignored in the downstream 
diffusion coefficient analysis. One can consider alternative pruning strategies; since u- 
track utilizes both frame-to-frame as well as a "gap-closing" (GC) associations M (the 
latter type of association is discussed in the next subsection), these features facilitate 
removing ambiguous measurements from a collection of tracks without necessarily 
sacrificing track length. The Correspondence Method can assist in more carefully 
pruning ambiguous segments of tracks (hence retaining more data) in both phases of 
u-track if minor modifications are made to the algorithm. However, before presenting 
these results, we provide an example illustrating some problems biological applications 

^The default settings of the MTT tracker resulted in 1-4 arcs per local association hypothesis 
(recall this algorithm is intended to divide the problem into many small association problems by 



aggressive gating [7[ |15| ). The difference between results obtained by omitting all tracks vs. only 
those in questionable arcs was found to be insignificant. In the u-track example, the advantages of 
taking the arc view become readily apparent. 
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(a) Accepting all tracks (b) Pruning uncertain tracks 

Fig. 4.3. Estimated spatial dependence of diffusion coefficient. 

face when tracking multiple particles when stochastic models are unknown a priori. 
We also use the example to sketch the ideas behind both phases of u-track [S] . 

4.2. Dangers of Model Uncertainty in SPT Applications. Figure |44] plots 
two simulated sample paths. The paths are both constructed by a single Brownian 
motion process. However, the two paths displayed are constructed using increments 
that are identical in magnitude, but opposite in sign (this is a demonstration of the 
reflection principle discussed in Ref. [29]). Around time 25, the two discretely sam- 
pled paths cross; the left panel plots the truth (measurements from the two separate 
objects are denoted by different symbols). Recall, that the observations are assumed 
to be point objects (the "symbol attributes" shown in the figure are not available to 
the researcher), therefore it is unclear which observations associate to which physical 
object. Furthermore, the researcher does not typically have accurate a priori knowl- 
edge of the data generating process. So in the toy illustrative example described 
here, there is not adequate statistical evidence to confidently determine which mea- 
surements should be associated with pre-existing tracks. Situations encountered in 
practical applications encounter more complex track ambiguity (as discussed later). 
In the right panel, unambiguous segments are denoted by solid lines (these segments 
can only be determined when the position measurements sufficiently separate). 

The frame-to- frame association phase (identical to the track-to-measurement phase 
described in Sec. Isl) of the original u-track algorithm fs] would force an association in 
the scenario shown in the left panel since measurements are close to nominal tracks 
(note: the wrong association would be made). However, an arc ambiguity or more 
advanced tracking algorithm leveraging information in multiple frames Tsl Tsl 22] 



could be used to break (or at least defer) the ambiguous frame-to-frame association 
made near time 25 in the left panel. In the second phase (the gap-closing step) of 
u-track [8], one attempts to join short track segments that do not overlap in time. 
Although we formulated the classic track-to-measurement 2D assignment problem in 
Sec. |3j one can also consider linking measurement-to-measurement or track-segment- 
to-track-segment spanning different times; the likelihood computation changes slightly 
in these scenarios |8j[l5] (in the GC phase, one merely needs to interpret a "particle 
measurement" as a track segment) r\ The goal of GC is to produce tracks containing 



^Merging and splitting events are common in classic [15| and SPT tracking 18 ; in such cases, a 
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measurements spanning a longer time. In u-track, GC associations are also formu- 
lated as a 2D assignment problem [s], so fc— best assignment solvers can readily be 
utilized. To construct the cost matrix of the GC step, end points of short track 
segments are paired with the beginnings of other track segments (feasible solutions 
prevent temporal overlap; an example of a cost matrix is shown in Fig. 
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Fig. 4.4. Illustration of track crossing ambiguity. The left panel displays the true trajectories. 
However, for this data generating process, there is no statistical evidence available for making un- 
ambiguous associations given only a collection of discretely sampled position coordinates. The right 
panel removes the ambiguous regions, resulting in 4 track segments. See text for discussion on this 
type of common problem in SPT (i.e., process noise is commensurate with measurement noise). 



The problem illustrated in Fig. |4.4| is that after the crossing event, there is no 
statistical evidence for stitching segments together even if one has exact knowledge of 
the data generating process. For this example, looking at multiple frames of data SfS^ 



15 22 via more sophisticated MHT approximations does not help in forming reliable 



tracks due to the statistical nature of discretely sampled systems where diffusive noise 
is a major contribution to the signal 29 . To make association decisions, standard 



tracking algorithms assume an SDK or a collection of SDK models characterizing 
all possible motion types that can be encountered 15 . The assumed models may be 



grossly inconsistent with reality; the real danger is that the a priori models can heavily 
influence formed tracks and "silently" introduce systematic biases. For example, if 
some of the models considered encourage straight line motion (e.g., SDKs with a 
constant drift driven by a standard Brownian motion are commonly used in SPT 
applications [5]), then eventually tracks adhering to this enforced structure will be 



4.4 



formed despite the lack of evidence for this motion in the raw data (in Fig, 
for times 10-40, an MHT using multiple frame tracking algorithm [22] considering a 
directed diffusion model in its cost matrix formulation may make grossly incorrect 
tracks after it observes enough of a trend it hopes to see) . 

The type of self-fulfilling prophecy described above introduces dangerous bias in 
scientific endeavors aiming to objectively characterize molecular motion in cells. In 
such applications, particles evolve in a complex, stochastic, time changing cellular 
environment. Each particle likely experiences different forces, friction, and thermal 



merged composite track can contain measurements from multiple physical objects. The UQ methods 
are readily applicable to 2D merging and splitting events, but to facilitate exposition we omit this 
discussion in the main text and retain our definition of a track (i.e., a collection of measurements 
believed to correspond to one physical object.) 
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fluctuations (accurately quantifying these forces is difficult to accurately predict a 
priori [30}|32] ). Crucial parameters, such as diffusion coefficients and measurement 
noise magnitude, can differ dramatically between tracks (and/or over time in a sin- 
gle molecule), but many cost functions assume that these parameters are frozen and 
equally applicable to all particles under consideration. Some trackers attempt to esti- 
mate measurement noise on-the-ffy {fj , but parameters governing particle motion are 
difficult to reliably extract given only measurement data (where multiple associations 
can occur). If one has all of the information required to construct cost matrices, such 



as those formed using particle filters 20 33 , then that information can be utilized 



(this might be the case when one is studying diffusion in a homogeneous medium). 
However, in live cell studies, one rarely has reliable knowledge of the functional form 
of the stochastic models governing the particle dynamics (even if one is fortunate 
enough to have access to a parametric form governing all particles, it is unlikely one 
has accurate and unbiased a priori knowledge on the parameter distribution defining 
the filters for in vivo SPT studies). For scientific applications aiming to extract un- 
ambiguous, objective, accurate models from experimental data, we advocate the view 
that one should assume relatively coarse dynamical models imposing minimal motion 
assumptions (such as pure diffusion with a relatively large diffusion coefficient) in 
track formation. The SPT algorithms demonstrated JTJIS] readily allow tracking with 
minimal modeling assumptions. 

4.3. Extending "u-track" to Include Association UQ. Next, the Corre- 
spondence Method is used in conjunction with the algorithm proposed in Ref. [8]. 
Our algorithm can be used to aid both association phases of the u-track. For the first 
frame-to- frame association phase, one can set a threshold, Pf2F- Another threshold, 
Pgci can be set for stitching segments together taking place in the second gap-closing 
phase. In the former, any association in the optimal solution k = \ having an arc 
ambiguity below Pf2F can be revoked (forcing track initiation). Similarly, in the GC 
phase, arc associations below Pgc can be rejected (this prevents questionable track 
stitching) . The result is a collection of track segments where one is fairly confident in 
the underlying arc associations. By rejecting gap-closing associations, one may pro- 
duce two tracks generated from one underlying molecule, but Pgc^ can be adjusted 
to balance track segment extension vs. association ambiguity. 

4.3.1. Application to Extracting Tracks in Crov^rded Cell Plasma Mem- 
brane Environments. To illustrate the u-track modifications discussed previously, 
we analyze the motion of quantum dot (QD)-labeled voltage gated potassium channels 
lacking the last 318 amino acids of the C-terminus, AC318 Kv2.1 i34j. The trunca- 
tion of the C-terminus has been shown to allow Kv2.1 to diffuse freely on the plasma 



membrane l34| whereas wild-type Kv2.1 displays confined motion 13 34 . The de- 
sired dense labeling in combination with the inherent blinking behavior QDs, presents 
serious challenges to current SPT tracking algorithms. This thereby creates an ideal 
system in which to test our algorithm. By pruning questionable tracks, our approach 
can aid in the analysis of ion channel motion in a complex biological system. 

4.3.2. Experimental Setup. Human embryonic kidney (HEK) 293 cells were 
induced to express 3 /ig//iL of the mutant channel AC318 Kv2.1 as described previ- 
ously |34] . The construct contains an extracellular biotin accepting domain allowing 
the channel to be enzymatically biotinylated and labeled with a streptavidin con- 
jugated QD (Qdot-655, Invitrogen). Cells were imaged in a home-built, objective- 
type total internal fluorescence microscope, under excitation with a 1 mW 473 nm 
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laser line 13 , 35 . Fluorescence was captured in a cooled back-illuminated electron- 
multiplied charge couple device (EMCCD, Andor). Movies were acquired using Andor 
IQ software at a frame rate of 20 frames per second. 



Sparsity Pattern of "Gap Closing" Cost Matrix 
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Fig. 4.5. The left panel displays a single frame of a microscope image where quantum dots 
are attached to a protein (the ion channel Kv2.1 \34\ with a mutation outlined in main text). The 
motion of multiple proteins was tracked in human embryonic kidney cells (the color overlay in the 
left panel displays some of the tracks generated by analyzing 400 frames of data with u-track lov). 
The top right panel displays the sparsity pattern of the "gap-closing" cost matrix employed by u- 
track (there are a total of 2705 track segments that can be connected after the 400 images were 
processed via the frame-to-frame phase of u-track). The rows labeled "End Track Segments" denote 
track indices corresponding to tracks that terrainated before the last frame; the columns labeled as 
"Start Track Segments" indicate track indices of tracks initiated after frame one. A non-zero entry 
in the cost matrix suggests a possible linking due to lack of temporal overlap and adequate spatial 
proximity of the two segments at the start and end points. The bottom right panel displays the 
estimated arc probabilities in the optimal solution falling below the Pqc threshold (here Pqc = 0.99J. 
For each arc in the optimal solution, {s : s a Hi}, we estimated P(s) and subsequently used the 
Correspondence Method and hypothesis Hi to generate the P{s) 's (the plot contains the histogram 
of{P{s):s&Hi}). 



A MATLAB-based tracker augmenting the publication in Ref. Is] was used to an- 
alyze a sequence of 400 images each receiving a camera exposure time of 50ins. Figure 
4.5| displays some representative tracks generated in this experiment. The parameters 
used in the tracking routine are reported in the Appendix. To illustrate the power 
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Fig. 4.6. Histograms of the estimated arc probabilities in the optimal solution falling below 
the Pec threshold (here Pqc = 0.99J. For each arc Si in H\ (subscript denotes optimal solution), 
we estimated P{si). The P{si)'s were computed using two values of k (10 and 100) using both the 
Traditional Method (left panels) and the Correspondence Method (right panels). The k = I case for 
the Correspondence Method is displayed in the bottom right panel of Fig. \4.5\ the distribution of 
the P{si) 's changes in a negligible fashion when k is increased (careful inspection of the histograms 
shows minor differences). The k = 1 case is trivial for the Traditional Method (i.e., P{si) = 1 for 
all i) so it is omitted. 



and practical utility of the Correspondence Method, we replaced the JV assignment 
solver with a modified Murty algorithm capable of processing rectangular cost ma- 
trices. The Murty algorithm subsequently fed the Traditional and Correspondence 
Method fc-best solutions 



12 27 



4.3 



In addition, the procedure discussed in Sec 
using Pec = Pf2F = 0.99 was applied. With the Correspondence Method, setting 
Pf2F = 0.99 and k — 1 resulted in 2658 segments being initiated after frame one and 
2656 track segments terminating before the final frame in the frame-to-frame associ- 
ation phase of the modified u-track i8j. If no UQ pruning was applied, i.e. Pf2F = 0, 
the tracking algorithm has only 2132 segments being initiated after the first frame 
and 2128 track segments terminating before the final frame; revoking questionable 
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frame-to-frame associations clearly increases the number of track segments. The sec- 
ond "gap-closing" phase attempts to put these segments back together. A total of 
2705 segments defined the square GC gap-closing cost matrix in the formulation of 
Ref. [|. 

The sparsity pattern of the square cost matrix defining the resulting gap-closing 
assignment problem is displayed in the upper right panel of Fig. |4.5[ Non-zero 
entries in this cost matrix indicate track segments that (i) do not share measurements 
overlapping in time and (ii) are spatially close enough (an input gating parameter 
f^)to be considered for GC track stitching. Note how the diagonal entries of the 
gap-closing cost matrix are all zero since a track segment temporally overlaps with 
itself (other zero entries are a result of either spatial or temporal gating). Only 389 
out of the 2705 track segments were unassigned in the k = I solution. In this crowded 
dense scenario with high diffusivity, this suggests that many track segments often 
cross and can be assigned with the default par ameters [s] defining the gap-closing 



cost matrix. The bottom right panel of Fig. 4.5 displays a histogram of the P{si) for 
non-null arcs Si € Hi whose ambiguities satisfy P{si) < Pqc = 0-99 computed using 
the Correspondence Method and 7^^'""p — Hi. Of the non-null arcs in Hi, 28.7% are 
below Pgc suggesting that a substantial fraction of track segments stitched together 
via GC are questionable. Note that for fc = 1, the modified Murty assignment solver 
and the original JV solver used in Ref. |8] provide the same association output. It 
should also be mentioned, given set of tracking parameters (often containing many 
tunable gating parameters [7y8[|15] ) , that the Correspondence Method can be used to 
quantify the quality of tracks formed for a given population of particles (the quality 
of tracks depends on many factors, e.g. density, diffusivity of tagged particles, and 
measurement noise). 



Figure 4.6 displays the distribution of P{si), using /c = 10 and k = 100 ob- 



tained using both the Traditional and Correspondence Method. The Corresponde nce 
Method's distribution of P{siys is relatively independent of k (the panels in Fig. 



4.6 



complement the bottom right panel of Fig. 4.5). Using k = 100 and the Correspon- 
dence Method, 28.9% of the non-null arcs in the optimal solution fall below the Pqc 
threshold. Insensitivity of P{si) to k confirms that a significant fraction of tracks 
generated in the second phase of this routine are of dubious quality due to ambiguous 
gap-closing assignments. 

The Traditional method has poor UQ estimates in this large problem for k G 
[1, 100] because many hypotheses have similar scores. There are small numerical dif- 
ferences between the top scoring hypotheses; furthermore, these high scoring hypothe- 
ses only differ by a few arcs. To make things concrete and to highlight why when using 
the Traditional Method to compute arc ambiguities so few P(si)'s fall below Pqcj the 
set difi'erence between the top scoring hypothesis and the four other top hypotheses 
are provided: Hi- H2 = {sa.Sh}, Hi- H3 = {sc,Sd}, and Hi-HA = {sa, Sb,Sc,Sd} 
(the actual arc numbers are uninformative) . In the top solutions, a permutation of 
a small number of arcs are going in and out of the top hypotheses. The likelihood 
ratio of L(iJioo) = 0.9513 x L{Hi). This is suggestive that the field of view of the 
microscope is too crowded using the tracking parameters and algorithm employed. 
The Traditional Method slowly explores the phase space of all feasible hypotheses, 
hence using k = 100, produces a poor estimate of the arc ambiguities. 

The Correspondence Method quickly explores the space of feasible arcs since arcs 
of interest can be systematically broken. The cluster probability computation permits 
one to approximate the relative importance of that arc. If one has the computational 
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resources and time to set k very large, then the Traditional Method's distribution 
would approach that of the Correspondence Method. However, for this large system, 
with k = 100 we pushed the RAM limits of the 8GB computer utilized in computa- 
tions; with the k-hest solver employed (finding the k — 100 best solutions on a laptop 
took under 4 minutes, and the post-processing of the data using the Correspondence 
Method took under 20 seconds). More sophisticated A;— best routines beyond the 
modified Murty algorithm can allow for larger k values in this problem with the same 
hardware, but a major advantage of the Correspondence Method is that one does not 
need to use excessively large k values to obtain reasonable UQ accuracy. We demon- 
strated that the problem of selecting a "large enough" value for k (which plagues the 
Traditional Method [l2]) can be substantially mitigated if one uses the Correspon- 
dence Method. Since k = 1 results are meaningful, one can use off-the-shelf linear 
assignment solvers and the Correspondence Method to assess the quality of tracks 
generated (this feature is practically attractive since not all researchers have easy 
access to state-of-the-art discrete optimization routines). 

5. Conclusions. A new algorithm for processing the uncertainty in data asso- 
ciation has been introduced. The technique forms new feasible hypotheses from given 
solutions to determine the ambiguity (i.e, discrete data association UQ) in arcs of in- 
terest. A procedure for fusing discrete optimization output (provided via any type of 
assignment solver) with inferred information to improve arc ambiguity accuracy was 
demonstrated on small scale control examples as well as large-scale problems. The 
basic intuition behind the method was presented via a variety of examples. Practical 
computational implementation details and a complexity analysis required for better 
understanding large-scale performance is provided in the Appendix. 

The method shows great promise for quantifying uncertainty when there are mul- 
tiple hypotheses having scores that are not well separated (i.e., a single global optimal 
hypothesis does not dominate given the metrics defining the cost matrix) . The method 
is fast, and only introduces marginal computational effort in real-time applications. 
In large-scale problems using off-line computations, memory may be exhausted before 
a good representative sample of association hypothesis can be constructed (hence the 



accuracy of traditional UQ methods can suffer greatly 12 ); in such situations, we 
demonstrated that the Correspondence Method can help in providing more reliable 
association UQ estimates. The algorithm can also readily wrap around various track- 
ers to generate UQ information. In applications where the underlying solver may be 
hard to access or modify, the Correspondence Method can still produce reasonable UQ 
estimates provided only a single association hypothesis (though accuracy improves as 
more hypotheses are provided). There are numerous application domains where such 
technology can be utilized. 

Regarding the SPT biophysics and cell biology applications focused on this paper, 
standard tracking algorithms in this problem domain "z , 3 , 5 , 7 , 8 typically input heuris- 
tic dynamical models (making many questionable assumptions) for creating metrics 
used to define various data association cost matrices. The Correspondence Method 
can aid in screening the vast number of tracks and determine which ones should be 
subjected to more demanding computational processing aiming to more accurately fit 
and test stochastic models characterizing the experimental data. If researchers can 
quantify probabilities of more complex physical events (e.g., endocytosis or molecular 
binding), our method can also be used to identify when these events of biological 
relevance can possibly describe the sequence of images generated by the microscope. 
However, in the biological SPT problem domain, we advocate minimizing a priori as- 
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sumptions about expected particle motion. That is, posit simple stochastic rules that 
can cover all types of motion that can be encountered instead of imposing complex 
models to make initial associations that may introduce systematic bias in the data 
association phase. Once unambiguous tracks (or track segments) are in hand, one can 



consider other time series modeling techniques 30 36] to extract detailed molecular 



kinetic information from the collection of reliable tracks (e.g., using methods such as 



those discussed in Ref. 37 ). In addition, one can consider other stitching mechanisms 
making better use of kinetic information inferred from the time series of unambiguous 
track segments. Regardless if one agrees with our viewpoint, the discrete optimization 
track UQ tools we introduce can be utilized to quantify the ambiguity in tracks formed 
via 2D association solvers. If one prefers using questionable sophisticated models of 
motion in the early stage of tracking (hence potentially biasing results) , our approach 
can still be leveraged to quantify when scenarios become too dense to form reliable 
tracks conditioned on one's prior assumptions. 
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Appendix A. Illustration of Cluster Formation Procedure. 



Defining liypotliesis in cluster C°'^(Hi) = C"^(Hi) (target arc denoted by red-daslied line). 



x 




Hypothesis (ii) in cluster C*i(//i) Hypothesis (iii) in cluster C°i(i/i) 

Fig. A.l. Illustration of forming clusters defined by arc ab. The target arc shown is contained 
in the k = 1 hypothesis (i.e., the global optimum). In this simple 2x2 example, the hypotheses 
above make up one of two possible clusters. The hypotheses in each cluster are formed by swapping 
the target arc with the other arcs in hypotheses containing the arc ab (this arc only appears in two 
hypothesis in this 2x2 problem). The hypotheses above making up one cluster which is denoted by 
C'^^{Hi). Switching the target arc with the other nonzero arc in the problem generates the bottom 
left hypothesis in cluster C'(-ffi); switching with (0, 0) generates the bottom right hypothesis. 
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Defining liypotliesis in cluster C'^^iHi,') = C^{H-i) (target arc denoted by red-dasiied line). 









Hypothesis (ii) in cluster ^^{H^) Hypothesis (iii) in cluster C^^IH^) 




Hypothesis (iv) in cluster C'^(Hs) 



Fig. a. 2. The hypotheses in cluster C^IH^) are formed using the procedure discussed above. 
The target arc shown is contained in the k = 3 best hypothesis. The clusters C^lHi) and C^^lH'i) 
exhaust all possible hypotheses in this problem. 



Figures A.1|A.2 graphically illustrates cluster formation implied by arc "a&" in 
the association hypothesis shown at the top of the figure panel. Arc "a6" corresponds 
to Si in Fig. |3.1| 



Appendix B. Cluster Probability Computations. Here we show how to 
compute Qab{C{H)) given H, where C{H) is the cluster containing H. We first 
consider the case where H G 'Hj,^, so C{H) = C'j^. We define the switch value 
S{ah, xy) of two arcs as follows: 



S{ah,xy) = I ^^"L^y . 

otherwise 
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Each hypothesis H — ab — xy + xb + ay in C^ has hkchhood L{H)S{ab,xy). Thus, 
the total hkehhood of the hypotheses in Cfj^ is L{H) 'YlxveH S{ab, xy). Since Cfj^ con- 
tains only one hypothesis in H„" (namely H), we have Qab{G{H)) = ^ \uhxv)- 

(Since one of the terms in this denominator is always S{ab, ab) which equals 1, so the 
denominator is always at least equal to 1.) 

We now consider the case oi H e H'^jf- Let H* = H - aa' - b'b + b'a' + ab, so 
H E C^i, (recall a' = adJH (a); see main text for definition of adJH (•) )■ 

Define the function 



S*(ab xy)=i ^ if(a-x)V(& = y) 

' S{ab,xy) otherwise 



We also define the following functions: 

S^°"'\ab,H)= Y^ S*{ab,xy) 

xyeH 



extra, „u Tj^ _ J S*{ab,b'a') if b'a' e A 



S'''""'{ab,H, ^ undefined if &'a' ^ A 

Given these definitions, we can conclude that if b'a' ^ A, then H* is undefined 
and Cab{H) = {H} (therefore 
Qab{C{H)^ ~ 0). Otherwise, we have: 

Qab{C{H*)) = ^ 



E(xy)eH*Siab,xy) 



1 

}^{xy)eH-aa'-b'b+b'a'+ab ^[O-b, Xy) 



1 + 2Z(xy)eH-aa'-b'b+b'a' S{ab,Xy) 



^ + E(xy)eH+b'a'S*iab,xy) 



1 + 5-*™(a6, H) + E(xy)eH S*iab, xy) 



1 + 5'=^*™(a6, H) + S^°^"-\ab, H) 



21 

Note that the above equation also applies to the case of if G Ti-ab ■ ^^ ^'^ define 
H* as equal to H for this ease, we have S*^^**"" (afe, H) — and thus the equation 
above holds. 

Appendix C. Complexity Analysis. 

Suppose that we have a set of samples where each sample differs by at most j 
changes (additions and deletions of ares) from the previous sample. The amount 
of computation required to compute the cluster probabilities for a sample depends 
on whether it is the first sample (therefore we must perform all computations from 
scratch) or a subsequent sample (in this case, we can use information from the previous 
sample). 

Additionally, the amount of computation required depends on how many target 
arcs we have. Since there is a different set of clusters for each target arc ab, we must 
keep track of 5**°*"' (a6, H) and 5"=^*'"" (a&, H) for each target arc ah separately. In 
many applications, we only have one or a small number of selected target arcs. In 
some applications, the target arcs are all the nonzero arcs in the best solution. In 
other applications, the target arcs are all the nonzero arcs in the problem. 

We let n be the maximum number of nodes in each frame and let k be the 
maximum degree of the nodes. 

Consider the case of the initial sample. There are at most n nonzero arcs in the 
sample. For each such arc xy, the only arcs ah such that S*{ah,xy) can possibly be 
nonzero are those with a adjacent to y and 6 adjacent to x. Since there are at most 
k possible choices for a and k possible choices for h, we can make this computation in 
0{k^) time, assuming that arcs can be accessed and their switch values computed in 
constant time. (Also, we can cache the computed list of S*{ah,xy) values in case we 
change the arc xy again in a later sample.) The computation of S''^^*'"" (a&, H) can be 
done in constant time for each arc ah. 

Thus, for the initial sample, we can compute the values of S*°*°'^(ah, H) for all the 
arcs by computing the values of S*{ah, xy) separately for each arc xy £ H. Since each 
arc xy takes 0{k^) time, the entire computation takes 0{nk'^) time. Computation of 
S'^^*'^'^{ab,H) takes constant time for each arc, or a total of 0{nk) time, so the full 
computation still only takes 0{nk^) time. If one is concerned only with the arcs in 
the best solution, then the only arcs ah that must be considered for any given xy are 
the arcs aa' where a is adjacent to y and a' is the node that a is associated with in the 
best solution, there are at most k such arcs, so the computation can be done in 0{nk) 
time. Similarly, the computation of S^^*'^'^{ah, H) adds only 0{n) time since there are 
at most n arcs in the best solution, so the total time is still 0{nk). For computation 
of the association probability of a single arc a6, we just need to enumerate all the arcs 
xy in the sample such that S*{ah,xy) ^ 0, and that can be done in 0(fc) time. 

For subsequent samples, when computing S*°*°'\ah,H), we can start with the 
previously computed totals and only need to update them by computing S*{ah,xy) 
for the xy that changed, and by assumption there are at most j of those changes. 
As before, computing all the S*{ah,xy) values for a single xy takes O(fc^) time if 
considering all arcs as possible values for ah, takes 0{k) time if considering only arcs 
in the best solution as possible values for ah, and takes constant time if considering 
only a single target arc ah. Additionally, the number of target arcs ah whose values of 
S^^*'^'^{ab,H) have changed (and thus need to be recomputed) are O(fc^) for the all- 
arcs case, 0{k) for the arcs-in-best-solution case, and constant time for the single-arc 
case. 

Appendix D. Computational Details Specific to Applications Presented. 



22 



Target Arcs 


First Sample Time 


Subsequent Sample Time 


All Arcs O(nfc^) 
Best Solution Arcs 0{nk) 
Single Arc 0{k) 

Table C.l 
Time complexity with nonzerc 


target 


0(jF) 
O(jfc) 
0(j) 

arcs. 



D.l. Local MTT Association Problem Details. To facilitate comparison 
with other computational approaches, we report the track and measurement ID num- 
bers identifying the particles shown in Fig. 1 of the main text. The 4x3 associa- 
tion problem shown was formulated using tracks IDs (193,194, 355,408) formed from 
frames 1-19 (formed using the tracking parameters indicated in the main text) with 
the particle measurement IDs (236,341,342) verified in frame 20. 

D.2. Stochastic Model. Note that we simultaneously estimate the drift (in- 
duced by confinement) as well as measurement and thermal noise from a single can- 
didate track (time spacing can be non- uniform). Accounting for the above factors 
explains why our (unpruned) estimated mean diffusion coefficient (0.0578 fim^/s) is 
slightly lower than reported in Ref. [7^. Our estimation procedure is outlined below. 
After tracks were formed we fit a stochastic differential equation (SDE) of the form: 



drt =^F[rt)dt + V2adBt (D.l) 

^t, =rt. + et, (D.2) 

where the two dimensional position of the particle measurements at time t is denoted 
by the vector rt = [xt,yt)^ ■ The subscript t is used to index time in the stochastic 
process. In the SDE above, Bt represents a standard 2D Brownian motion process. 



The notation dBt is used to denote an Ito 38 stochastic integral (similarly for drt). 



F(rt), is a vector representing the force associated with r^, $ is a matrix quantifying 
effective friction, and a is related to the local diffusion coefficient [32]. Stochastic 
effects inherent to SPT experiments, such as photon count uncertainty, prevent us 
from directly observing rt- The only quantity we can directly measure in the lab is 
denoted by tp- This process is also indexed by time, e.g. , 'ipa, but we assume that 
measurements are only available at discrete time points. The random variable tt^ 
represents one 3D measurement noise realization at time t^. 

We use the notation e^. ~ A/'(0, R) to convey that e^. is assumed to be distributed 
according to a mean zero multivariate Gaussian with covariance R. Decoupling mea- 
surement noise from thermal noise inherent to all SPT trajectories is made possible 
by applying classic filtering |39] along with modern SDE estimation 40] . These tech- 
niques ignore dt terms (i.e., the drift). Thermal and measurement noise estimates are 
then used to seed a Nelder-Mead algorithm (100 different initial guesses were used 
to help ensure that a local minimum was not encountered) searching for the global 



optimum parameter of the MLE (see 38 for cost function formulation) . 

The specific model we consider is a linear SDE parameterized by a finite dimen- 
sional parameter vector denoted by 9. The terms in Eqns. |D.l| and |D.2| are defined 
by the following equations: 
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$ =crcrT/fesT (D.3) 

F{r) =A + Br (D.4) 

a =C (D.5) 

e -7V(0, i?) (D.6) 

Given the definition above, the coUection of parameters to be estimated by the 
data can be written as 9 = {A,B,C',R). ^ is a 2D vector (i.e., A S M^). B, C, and 
R are 2x2 real matrices. The equations above were inspired by models in statistical 



physics 32 41 , e.g., ksT represents Boltzmann's constant multiplied by the system 



temperature. Also, we assume that the inertia of the particle can be ignored through 



a fluctuation dissipation relationship stated in Eqn. D.3 i.e., the tagged particle 



is in the "overdamped" regime ^32,. Equation D.4 defines the parametric form of 
the effective local linear force experienced by the overdamped particle and B can be 
interpreted as an elasticity parameter. 

D.3. Gap-closing Parameters Utilized. We used the default parameters in 
the publicly available code associated with Ref. IS]. However, we increased the 
IgapCloseParELm . timeWindow= 200 in order liberally allow for tracks to be stitched 
after quantum dot blinking (the UQ algorithm in conjunction with the Pgc thresh- 
olding employed prevented overly aggressively associating track segments). We also 
set gapCloseParam.mergeSplit= to suppress merging and splitting; this was only 
done to facilitate exposition, the tools introduced are readily applicable to scenarios 
including merge an split events in the gap-closing phase of track formation. 

Since our method also assumes that the cost matrix connecting objects in two 
adjacent frames is the negative of a log likelihood, we transformed the squared distance 
variable Sjj (see Eqns. 3-4 in Ref. [8]) to Sij. This transformation corresponds to the 
log likelihood of an exponential distribution with parameter one. The choice of which 
probability distribution is subjective m; regardless of what distribution one desires to 
use for gap-closing, the correspondence method can be used to assess the uncertainty. 

D.4. Spline fitting procedure. Each track had a single diffusion coefficient 
fit to the track data. The MLE estimate of the two-dimensional diffusion coefficient 
served as the "dependent variable" and the time averaged {x, y) position of each track 
served as the independent variables indexing the measurement (the individual tracks 
did not move far relative to inter-track variance). The default thin plate smoothing 



parameters of the package in Ref. 42 was used to fit the resulting scatterplot surface 



(generalized cross validation smoothing was used). 
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